Context-sensitive statistical language modeling
Abstract
We present context-sensitive dynamic classes – a novel mechanism for integrating contextual information from spoken dialogue into a class n-gram language model. We exploit the dialogue system’s information state to populate dynamic classes, thus percolating contextual constraints to the recognizer’s language model in real time. We describe a technique for training a language model incorporating context-sensitive dynamic classes which considerably reduces word error rate under several conditions. Significantly, our technique does not partition the language model based on potentially artificial dialogue state distinctions; rather, it accommodates both strong and weak expectations via dynamic manipulation of a single model.
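The abstract gives no implementation detail, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a class bigram factorization P(w | prev) = P(class(w) | class(prev)) * P(w | class(w)), a uniform distribution over the current members of a dynamic class, and a hypothetical music-player domain. The names DynamicClassBigram, set_class and prob, and all probability values, are illustrative assumptions.

```python
class DynamicClassBigram:
    """Toy class bigram model whose class memberships can be replaced at run time."""

    def __init__(self, class_bigram_probs):
        # (previous class, class) -> P(class | previous class).
        # In a real system these would be trained offline; here they are made-up values.
        self.class_bigram_probs = class_bigram_probs
        # class name -> {word: P(word | class)}
        self.members = {}

    def set_class(self, name, words):
        # Repopulate a class from the current dialogue context.
        # Uniform membership probability is an assumption of this sketch.
        unique = list(dict.fromkeys(words))  # drop duplicates, keep order
        self.members[name] = {w: 1.0 / len(unique) for w in unique}

    def prob(self, prev_class, cls, word):
        # P(word | prev) = P(cls | prev_class) * P(word | cls)
        p_class = self.class_bigram_probs.get((prev_class, cls), 0.0)
        p_word = self.members.get(cls, {}).get(word, 0.0)
        return p_class * p_word


# Hypothetical example: the SONG class is refilled as the dialogue context narrows.
lm = DynamicClassBigram({("PLAY", "SONG"): 0.5})
lm.set_class("SONG", ["yesterday", "imagine", "help", "michelle"])
p_before = lm.prob("PLAY", "SONG", "imagine")

# The information state now says only two songs are in focus.
lm.set_class("SONG", ["imagine", "yesterday"])
p_after = lm.prob("PLAY", "SONG", "imagine")
p_dropped = lm.prob("PLAY", "SONG", "help")
```

Because only the class membership table changes, the class-level n-gram statistics stay shared across all dialogue contexts, which is one way to read the abstract's "dynamic manipulation of a single model". In this sketch a word outside the current class gets probability zero; a model that also accommodates weak expectations would need some smoothing or back-off instead.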
Similar resources
Acoustic modeling and language modeling for Cantonese LVCSR
This paper describes our recent work on the development of a large-vocabulary, speaker-independent continuous speech recognition system for Cantonese (a major Chinese dialect). Both acoustic modeling and language modeling are being addressed. For acoustic modeling, we focus on right-context-dependent sub-syllable units. Tying of HMM at model as well as state level is applied based on phonetic k...
Stochastic k-Tree Grammar and Its Application in Biomolecular Structure Modeling
Stochastic context-free grammar (SCFG) has been successful in modeling biomolecular structures, typically RNA secondary structure, for statistical analysis and structure prediction. Context-free grammar rules specify parallel and nested co-occurrences of terminals, and thus are ideal for modeling nucleotide canonical base pairs that constitute the RNA secondary structure. Stochastic grammars h...
Statistical Modeling of Pronunciation Variation by Hierarchical Grouping Rule Inference
In this paper, a data-driven approach to statistical modeling of pronunciation variation is proposed. It consists of learning stochastic pronunciation rules. The proposed method jointly models different rules that define the same transformation. The Hierarchic Grouping Rule Inference (HIEGRI) algorithm is proposed to generate this model based on graphs. The HIEGRI algorithm detects the common patterns of ...
Word Sense Disambiguation for Statistical Machine Translation
While much effort has been put in designing and evaluating Word Sense Disambiguation (WSD) models for translation in the WSD community, standard Statistical Machine Translation (SMT) systems have achieved remarkable improvements in translation quality without modeling WSD explicitly. However, inspecting SMT output suggests that SMT needs better semantic modeling to accurately translate meaning....
Improving Language Modeling by Combining Heterogeneous Corpora
In applying statistical language modeling, directly adding training data (e.g. from websites) may not always improve the performance of language models because the data may not be suitable for the application or may contain errors. This paper presents a method of combining multiple heterogeneous corpora to improve the resulting language models, called compressed context-dependent interpolation schem...
Publication date: 2005